AITopics | corresponding text

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems, Feb-17-2026, 21:43:11 GMT

We present a novel OCR-free document understanding framework based on pre-trained Multimodal Large Language Models (MLLMs).

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Information Technology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)


4730d10b22261faa9a95ebf7497bc556-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 16:55:08 GMT

generspeech, mean opinion score, visualization, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)


Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems, Oct-10-2025, 15:23:54 GMT

We present a novel OCR-free document understanding framework based on pre-trained Multimodal Large Language Models (MLLMs).

corresponding text, dataset, image text, (12 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Information Technology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)


4730d10b22261faa9a95ebf7497bc556-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-14-2025, 14:18:25 GMT

generspeech, mean opinion score, visualization, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)


Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning

Role, François; Meyer, Sébastien; Amblard, Victor

arXiv.org Artificial Intelligence, May-7-2025

Vision-language models (VLMs) embed texts and images in a shared representation space. However, these models have been shown to exhibit a modality gap: a clear separation between the embeddings of one modality and those of the other in the embedding space. Although this misalignment is detrimental to downstream tasks such as multimodal retrieval, multimodal clustering, and zero-shot classification, no generic and practical methods have so far been proposed to assess it precisely, let alone reduce it. We therefore propose novel measures and effective techniques (spectral- and optimal transport-based methods) to achieve this goal. Extensive experiments conducted on several image-text datasets and models demonstrate their effectiveness and beneficial effects on downstream tasks. Our code is available at the URL provided in the paper's abstract.
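
The modality gap described in this abstract can be illustrated with a small sketch. It does not reproduce the paper's spectral or optimal transport-based measures. Instead it assumes a much simpler proxy, the Euclidean distance between the per-modality centroids of L2-normalized embeddings, and it uses synthetic vectors in place of real VLM outputs. All function names (`centroid_gap`, `make_toy_embeddings`, and so on) are hypothetical and chosen only for this illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def centroid_gap(embs_a, embs_b):
    """Assumed proxy for the modality gap: distance between the
    centroids of two sets of L2-normalized embeddings."""
    c_a = centroid([l2_normalize(v) for v in embs_a])
    c_b = centroid([l2_normalize(v) for v in embs_b])
    return math.dist(c_a, c_b)


def make_toy_embeddings(n, offset, seed):
    """Synthetic stand-in for model outputs: Gaussian noise plus a
    fixed offset vector that plays the role of a modality-specific shift."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) + o for o in offset] for _ in range(n)]


dim = 8
shift = [3.0] + [0.0] * (dim - 1)
neg_shift = [-3.0] + [0.0] * (dim - 1)

# Two "modalities" shifted in opposite directions along the first axis.
image_embs = make_toy_embeddings(200, shift, seed=0)
text_embs = make_toy_embeddings(200, neg_shift, seed=1)

# A second sample drawn like the image embeddings, used as a baseline.
same_embs = make_toy_embeddings(200, shift, seed=2)

gap_across = centroid_gap(image_embs, text_embs)
gap_within = centroid_gap(image_embs, same_embs)

print(f"gap across modalities: {gap_across:.3f}")
print(f"gap within one modality: {gap_within:.3f}")
```

In this toy setup the gap across the two shifted sets comes out much larger than the gap between two samples from the same distribution, which is the kind of separation the abstract refers to. With real data, the inputs would be image and text embeddings from the same VLM, and a centroid distance would capture only one coarse aspect of the misalignment.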

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.03703

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
